
    PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting

    Crowd counting, i.e., estimating the number of people in a crowded area, has attracted much interest in the research community. Although many attempts have been reported, crowd counting remains an open real-world problem due to the vast scale variations in crowd density within the area of interest and severe occlusion among the crowd. In this paper, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, that leverages attention, pyramid-scale features, and two-branch decoder modules for density-aware crowd counting. PDANet utilizes these modules to extract features at different scales, focus on the relevant information, and suppress misleading cues. We also address the variation of crowdedness levels among different images with an exclusive Density-Aware Decoder (DAD). For this purpose, a classifier evaluates the density level of the input features and then routes them to the corresponding high- and low-crowdedness DAD modules. Finally, we generate an overall density map by treating the summation of the low- and high-crowdedness density maps as spatial attention. Meanwhile, we employ two losses to create a precise density map for the input scene. Extensive evaluations on challenging benchmark datasets demonstrate that the proposed PDANet surpasses well-known state-of-the-art methods in both counting accuracy and the quality of the generated density maps.
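
    The density-level routing described above can be pictured with a minimal PyTorch sketch. This is not the authors' implementation: the layer sizes and the single-convolution decoder branches are placeholder assumptions that only show how a classifier can weight a low- and a high-crowdedness branch.

    ```python
    import torch
    import torch.nn as nn

    class DensityAwareDecoder(nn.Module):
        """Sketch of classifier-routed low/high-density decoding (assumed sizes)."""

        def __init__(self, in_ch=512):
            super().__init__()
            # Classifier scoring how crowded the input feature map is.
            self.level_classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(in_ch, 2), nn.Softmax(dim=1),
            )
            # Placeholder decoder branches; the paper's DAD modules are deeper.
            self.low_dad = nn.Conv2d(in_ch, 1, kernel_size=1)
            self.high_dad = nn.Conv2d(in_ch, 1, kernel_size=1)

        def forward(self, feats):
            w = self.level_classifier(feats)   # (B, 2) density-level weights
            low = self.low_dad(feats)          # (B, 1, H, W) low-crowd map
            high = self.high_dad(feats)        # (B, 1, H, W) high-crowd map
            # Summing the weighted maps acts as spatial attention over the scene.
            return (w[:, 0].view(-1, 1, 1, 1) * low +
                    w[:, 1].view(-1, 1, 1, 1) * high)
    ```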

    Cross-Modal Contrastive Learning for Robust Reasoning in VQA

    Multi-modal reasoning in visual question answering (VQA) has witnessed rapid progress recently. However, most reasoning models heavily rely on shortcuts learned from training data, which prevents their usage in challenging real-world scenarios. In this paper, we propose a simple but effective cross-modal contrastive learning strategy to eliminate the shortcut reasoning caused by imbalanced annotations and improve overall performance. Unlike existing contrastive learning methods that rely on complex negative categories at the coarse (Image, Question, Answer) triplet level, we leverage the correspondences between the language and image modalities to perform finer-grained cross-modal contrastive learning. We treat each Question-Answer (QA) pair as a whole and differentiate between images that conform with it and those that contradict it. To alleviate the issue of sampling bias, we further build connected graphs among images. For each positive pair, we regard images from different graphs as negative samples and derive a multi-positive version of contrastive learning. To the best of our knowledge, this is the first paper to reveal that a general contrastive learning strategy, without delicate hand-crafted rules, can contribute to robust VQA reasoning. Experiments on several mainstream VQA datasets demonstrate our superiority over state-of-the-art methods. Code is available at \url{https://github.com/qizhust/cmcl_vqa_pl}.
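
    The multi-positive contrastive objective can be sketched as follows. This is a generic InfoNCE-style formulation, not the released code; the embedding shapes and the temperature value are illustrative assumptions.

    ```python
    import torch
    import torch.nn.functional as F

    def multi_positive_contrastive_loss(qa_emb, img_embs, pos_mask,
                                        temperature=0.07):
        """qa_emb:   (D,) embedding of one QA pair, treated as a whole.
        img_embs:   (N, D) embeddings of candidate images.
        pos_mask:   (N,) bool mask, True for images conforming with the QA pair.
        """
        qa = F.normalize(qa_emb, dim=0)
        imgs = F.normalize(img_embs, dim=1)
        logits = imgs @ qa / temperature                  # (N,) similarities
        log_prob = logits - torch.logsumexp(logits, dim=0)
        # Multi-positive form: average log-probability over all positives.
        return -log_prob[pos_mask].mean()
    ```

    In this sketch, images drawn from other connected graphs would populate the rows of img_embs where pos_mask is False, matching the negative-sampling scheme the abstract describes.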

    BAVS: Bootstrapping Audio-Visual Segmentation by Integrating Foundation Knowledge

    Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding sources by predicting pixel-wise maps. Previous methods assume that each sound component in an audio signal always has a visual counterpart in the image. However, this assumption overlooks that off-screen sounds and background noise often contaminate audio recordings in real-world scenarios. These sounds impose significant challenges on building a consistent semantic mapping between audio and visual signals for AVS models and thus impede precise sound localization. In this work, we propose a two-stage bootstrapping audio-visual segmentation framework that incorporates multi-modal foundation knowledge. In a nutshell, our BAVS is designed to eliminate the interference of background noise or off-screen sounds in segmentation by establishing audio-visual correspondences in an explicit manner. In the first stage, we employ a segmentation model to localize potential sounding objects from visual data without being affected by contaminated audio signals. Meanwhile, we also utilize a foundation audio classification model to discern audio semantics. Since the audio tags provided by the audio foundation model are noisy, associating object masks with audio tags is not trivial. Thus, in the second stage, we develop an audio-visual semantic integration strategy (AVIS) to localize the authentic sounding objects. Here, we construct an audio-visual tree based on the hierarchical correspondence between sounds and object categories. We then examine the label concurrency between the localized objects and the classified audio tags by tracing the audio-visual tree. With AVIS, we can effectively segment real sounding objects. Extensive experiments demonstrate the superiority of our method on AVS datasets, particularly in scenarios involving background noise. Our project website is https://yenanliu.github.io/AVSS.github.io/
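
    The label-concurrency check on the audio-visual tree can be illustrated with a small sketch. The tree contents and function names here are invented for illustration; the idea is simply to test whether a localized object's class and a classified audio tag meet somewhere in a shared category hierarchy.

    ```python
    # Hypothetical parent map encoding an audio-visual category tree.
    parent = {"dog": "animal", "bark": "animal", "engine": "vehicle"}

    def ancestors(label, parent):
        """Yield a label and all of its ancestors in the tree."""
        while label is not None:
            yield label
            label = parent.get(label)

    def labels_concur(object_label, audio_tag, parent):
        """True if the visual label and the audio tag share an ancestor."""
        return bool(set(ancestors(object_label, parent)) &
                    set(ancestors(audio_tag, parent)))

    assert labels_concur("dog", "bark", parent)        # both trace to 'animal'
    assert not labels_concur("dog", "engine", parent)  # disjoint branches
    ```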

    Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

    In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weight averaging, and data augmentation, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy-label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
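
    A minimal sketch of such a regularized cross-entropy baseline is shown below, assuming a standard PyTorch setup. The optimizer settings and schedule are generic placeholders, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn as nn
    from torch.optim.swa_utils import AveragedModel, update_bn

    def train(model, loader, epochs=100, lr=0.1):
        """Cross-entropy training with LR decay and model weight averaging."""
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                    momentum=0.9, weight_decay=5e-4)
        # Learning rate decay: cosine annealing over the full run.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=epochs)
        averaged = AveragedModel(model)       # running average of the weights
        for _ in range(epochs):
            for x, y in loader:               # loader applies data augmentation
                optimizer.zero_grad()
                criterion(model(x), y).backward()
                optimizer.step()
                averaged.update_parameters(model)
            scheduler.step()
        update_bn(loader, averaged)           # refresh batch-norm statistics
        return averaged                       # evaluate the averaged model
    ```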

    Machine intelligence for nerve conduit design and production

    Nerve guidance conduits (NGCs) have emerged from recent advances in tissue engineering as a promising alternative to autografts for peripheral nerve repair. NGCs are tubular structures made of engineered biomaterials that guide axonal regeneration from the injured proximal nerve to the distal stump. NGC design can synergistically combine multiple properties to enhance the proliferation of stem and neuronal cells, improve nerve migration, attenuate inflammation, and reduce scar tissue formation. The aim of most laboratories fabricating NGCs is the development of an automated process that incorporates patient-specific features and complex tissue blueprints (e.g., a neurovascular conduit) that serve as the basis for more complicated muscular and skin grafts. One of the major limitations of tissue engineering is the lack of guidance for generating tissue blueprints and the absence of streamlined manufacturing processes. With the rapid expansion of machine intelligence, high-dimensional image analysis, and computational scaffold design, optimized tissue templates for 3D bioprinting (3DBP) are feasible. In this review, we examine the translational challenges to peripheral nerve regeneration and where machine intelligence can help overcome bottlenecks in neural tissue engineering.

    The Photodynamic Effect of Different Size ZnO Nanoparticles on Cancer Cell Proliferation In Vitro

    Nanomaterials have been widely used in biological and biomedical fields such as tissue imaging, diagnosis, and cancer therapy. In this study, we explored the cytotoxicity and photodynamic effect of different-sized ZnO nanoparticles on target cells. Our observations demonstrated that ZnO nanoparticles exerted dose-dependent and time-dependent cytotoxicity toward cancer cells such as hepatocellular carcinoma SMMC-7721 cells in vitro. Meanwhile, we observed that UV irradiation could enhance the ability of ZnO nanoparticles to suppress cancer cell proliferation, and that these effects were size-dependent. Furthermore, when ZnO nanoparticles were combined with daunorubicin, the cytotoxicity of the anticancer agent toward cancer cells was markedly enhanced, suggesting that ZnO nanoparticles could play an important role in drug delivery. These findings point to promising applications of ZnO nanoparticles in clinical and biomedical areas such as photodynamic cancer therapy.